batch18 
QC REPORT 
Input files downloaded from:
 /nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/mbrave_batch_data/batch18/ 
Output files are saved to:
 /nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch18/ 

The consensus network .tsv file exists: TRUE 
The fasta file exists: TRUE 
The sample statistics file exists: TRUE 
The negative control statistics file exists: TRUE 
The positive control statistics file exists: TRUE

Statistics for the positive controls

Total number of positive controls: 96 
Number of positive controls per plate: 1 

All plates have positive controls: TRUE  
Total number of reads in positive controls: 29562 
Maximum number of reads: 466 in positive control sample: CONTROL_POS_BAYS_017_G12 
Minimum number of reads: 3 in CONTROL_POS_FACE_201_G12 

Average number of positive control reads: 307.9375 
Median number of positive control reads: 317 
Read standard deviation: 85.9480641832154 

Quantiles:
 5%: 128.5
10%: 191.5
25%: 279
50%: 317
75%: 354.25
95%: 436
100%: 466

Blue solid line: read mean
Orange dotted lines: 5% and 10% lower quantiles

Number of positive control samples in the lower 5% quantile: 5 

CONTROL_POS_FACE_201_G12
CONTROL_POS_FACE_202_G12
CONTROL_POS_FACE_205_G12
CONTROL_POS_FACE_207_G12
CONTROL_POS_FACE_208_G12 
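
A minimal sketch of how the lower-quantile flagging above could be reproduced. It assumes the linear-interpolation quantile rule (the default in R and NumPy) and a strict "below the cutoff" comparison; the report confirms neither. The function names are hypothetical, and so are all read counts except the reported minimum (3) and maximum (466).

```python
import math


def quantile(values, p):
    """Linear-interpolation quantile (the 'type 7' rule, default in R and NumPy)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def below_quantile(reads_by_sample, p):
    """Return (cutoff, sorted sample names whose read count is below the cutoff)."""
    cutoff = quantile(reads_by_sample.values(), p)
    flagged = sorted(name for name, n in reads_by_sample.items() if n < cutoff)
    return cutoff, flagged


# Hypothetical data: only the values 3 and 466 appear in the report.
reads = {
    "CONTROL_POS_FACE_201_G12": 3,
    "CONTROL_POS_EXAMPLE_A_G12": 280,
    "CONTROL_POS_EXAMPLE_B_G12": 317,
    "CONTROL_POS_EXAMPLE_C_G12": 354,
    "CONTROL_POS_BAYS_017_G12": 466,
}
cutoff, flagged = below_quantile(reads, 0.05)
print(cutoff, flagged)
```

With these five toy values only CONTROL_POS_FACE_201_G12 falls below the 5% cutoff.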

Names of the associated partners: BIFOR

Statistics for the negative controls

Total number of negative controls: 621 
Total number of lysate negative controls: 525 
Total number of empty negative controls: 96 

Number of negative controls per plate:
Number of negative controls per plate  Number of plates
2                                      87
more than 2                            9

All plates have negative controls: TRUE 

Total number of reads in lysate negative controls: 1629 
Total number of reads in empty negative controls: 86 
Maximum number of reads: 292 in lysate negative control sample: CONTROL_NEG_LYSATE_FACE_203_H12 
Maximum number of reads: 5 in empty negative control sample: CONTROL_NEG_BAYS_019_A2 

Zero reads in: 424 negative control samples 
In lysate controls: 381 
In empty controls: 43 

Average number of negative control reads: 2.76167471819646 
In lysate controls: 3.10285714285714 
In empty controls: 0.895833333333333 

Median number of negative control reads: 0 
In lysate controls: 0 
In empty controls: 1 

Skewness of negative control read counts: 11.0469717733312 
In lysate controls: 10.1363579769267 
In empty controls: 1.30385483117931 
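
The means above follow directly from the reported totals, and the large positive skewness reflects a distribution dominated by zeros with a few high counts. The sketch below recomputes the means and shows a moment-based skewness on toy data. Which skewness estimator the report uses is not stated, so the formula here is an assumption, and the toy counts are hypothetical apart from the reported maximum of 292.

```python
def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (assumed estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Means recomputed from the totals reported above.
lysate_mean = 1629 / 525            # reads / lysate negative controls
empty_mean = 86 / 96                # reads / empty negative controls
overall_mean = (1629 + 86) / 621    # all reads / all negative controls

# Hypothetical zero-inflated counts with one contaminated well.
toy_counts = [0, 0, 0, 0, 0, 0, 1, 1, 3, 292]
toy_skew = skewness(toy_counts)
print(lysate_mean, empty_mean, overall_mean, toy_skew)
```

A single contaminated well is enough to pull the mean far above the median, which is why the median is the more robust summary for the negative controls.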

Quantiles in lysate controls:
 5%: 0
10%: 0
25%: 0
50%: 0
75%: 1
95%: 3
98%: 15.5599999999999 

Quantiles in empty controls:
 5%: 0
10%: 0
25%: 0
50%: 1
75%: 1
95%: 3
98%: 3.09999999999999

Blue solid line: read mean
Orange dotted lines: upper 5% and 2% of samples with the highest number of reads

Number of negative control samples in the higher 5%: 39 
 Of which in the lysate controls: 30 
 Of which in the empty controls: 9 
 
Number of negative control samples in the higher 2%: 13 
 Of which in the lysate controls: 13 
 Of which in the empty controls: 0 

Names of the associated partners: BAYS, CAMP, BIFOR, LFLA, RRHP, WTPV

Statistics for the samples

Number of samples in the batch (excluding controls): 8499 
Total number of partner plates: 96 
Total number of sample reads: 2832708 

Maximum number of sample reads: 906 in sample: LFLA_010_A2 
Minimum number of sample reads: 0 in 310 samples
 which is 3.64748793975762 % of all samples 

Average number of reads: 333.298976350159 
Median number of reads: 360 
Read standard deviation: 200.8572183961 
Skewness of sample read counts: -0.170122229845947 

Quantiles:
 5%: 1
10%: 18
25%: 166.5
50%: 360
75%: 489
95%: 636
100%: 906

Blue solid line: read mean
Orange dotted lines: lower 5% and 10% of samples

Number of samples in the lower 10%: 851 out of 8499 samples 
Number of samples in the lower 5%: 461 out of 8499 samples 

Partners associated with the bottom 5% of samples by read count:
Partner names Frequency
RRHP 176
BIFOR 127
LFLA 70
BAYS 57
WTPV 23
CAMP 8

Number of samples with 0 reads: 310

Plate boxplots

Plates where the 75th percentile of the data is lower than the expected mean read count (dark grey):

FACE_006
FACE_010
FACE_011
FACE_013
FACE_017
FACE_020
FACE_201
FACE_202
FACE_203
FACE_204
FACE_205
FACE_206
FACE_207
FACE_208
RRHP_026
RRHP_027
RRHP_061
WTPV_014 
 
which constitutes 18.75 % of all partner plates in this batch
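
The plate flag above can be sketched as a comparison of each plate's 75th percentile against a threshold. The report does not define the "expected mean read count", so the sketch takes it as a parameter and uses the batch-wide sample mean (about 333.3) only as a stand-in. The plate names, read counts, and function name are hypothetical.

```python
from statistics import quantiles


def low_read_plates(reads_by_plate, expected_mean):
    """Return plates whose 75th percentile of read counts is below expected_mean."""
    flagged = []
    for plate, reads in sorted(reads_by_plate.items()):
        # method="inclusive" gives the linear-interpolation quartiles (R type 7).
        q3 = quantiles(reads, n=4, method="inclusive")[2]
        if q3 < expected_mean:
            flagged.append(plate)
    return flagged


# Hypothetical plates with five wells each.
plates = {
    "PLATE_A": [300, 350, 400, 450, 500],
    "PLATE_B": [0, 5, 40, 90, 120],
}
flagged = low_read_plates(plates, expected_mean=333.3)

# Share of flagged partner plates reported above: 18 of 96.
share = 100 * 18 / 96
print(flagged, share)
```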

Grey line: median
Brown line: mean
Green data points: positive controls
Blue data points: empty negative controls
Navy data points: lysate negative controls

UMI plates where the 75th percentile of the data is lower than the expected mean read count (dark grey): 4, 7, 8 
Percentage of samples from the low-performance partner plates that are also on the low-performance UMI plates (purple data points): 55.9230306674684 %

Assess the positive controls with the low number of reads detected in the previous steps:

FACE_201 Positive control failed.
 Observed number of reads: 3 Expected: 102.161290322581 

FACE_202 Positive control failed.
 Observed number of reads: 55 Expected: 61.8817204301075 

FACE_205 More reads in positive control than in samples on average.
 Observed number of reads: 127 Expected: 50.4086021505376 

FACE_207 More reads in positive control than in samples on average.
 Observed number of reads: 119 Expected: 66.5698924731183 

FACE_208 Positive control failed.
 Observed number of reads: 75 Expected: 118.075268817204 
FACE_205 
FACE_207 
FACE_208 
FACE_201 
FACE_202 
The above plates have lower than expected number of reads 
AND failed positive controls. 
THESE PLATES NEED TO BE EXAMINED FURTHER
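
The two messages above are consistent with a simple comparison of the observed positive control reads against an expected value per plate. The sketch below reproduces them under that assumption; the actual rule used by the pipeline is not stated in the report. The Expected values look like a per-plate mean over 93 sample wells (for example 102.161290322581 × 93 ≈ 9501), but that too is an inference. The function name is hypothetical.

```python
def assess_positive_control(observed, expected):
    """Compare a positive control's read count with the plate's expected read count."""
    if observed < expected:
        return "Positive control failed."
    return "More reads in positive control than in samples on average."


# (observed, expected) pairs copied from the report above.
checks = {
    "FACE_201": (3, 102.161290322581),
    "FACE_202": (55, 61.8817204301075),
    "FACE_205": (127, 50.4086021505376),
    "FACE_207": (119, 66.5698924731183),
    "FACE_208": (75, 118.075268817204),
}
results = {plate: assess_positive_control(*pair) for plate, pair in checks.items()}
failed = sorted(p for p, msg in results.items() if msg == "Positive control failed.")
print(failed)
```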

Low-quality plates are displayed here. All the other plates are plotted in the last part of this report.
Green squares: controls [any kind]

Assessment of sequence conflicts and contaminants

Positive control as contamination source

NOTE: All sample and sequence IDs match - data successfully merged
Positive control OTU is TAX:1287025 

Non-positive control samples that contain positive control reads:
Sample        Control Sequence Count  Sequence Similarity  Sequence Type  UMI Plate ID
BAYS_002_F11  1                       99.84779             secondary      9
BAYS_015_B6   1                       99.84825             secondary      13
BAYS_019_E8   1                       99.84871             secondary      14
LFLA_003_G8   1                       99.84848             secondary      19
LFLA_012_A5   1                       98.31547             secondary      15
RRHP_026_A4   1                       99.84802             secondary      4
RRHP_027_D8   1                       100.00000            primary        5
WTPV_020_H2   1                       99.69697             secondary      17
Number of samples with positive control OTU as primary sequence: 1 
Number of samples with positive control OTU as secondary sequence: 7 
out of 5838 samples with secondary sequences 
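
A minimal sketch of how such hits could be tallied, assuming one record per sample and sequence with an OTU label and a primary/secondary flag. The record layout, the function name, and the placeholder OTU TAX:0000000 are hypothetical; the first two records reuse rows from the table above, and treating "Control Sequence Count" as a read count is an assumption.

```python
from collections import Counter

POSITIVE_CONTROL_OTU = "TAX:1287025"  # value reported above


def control_hits(records, control_otu=POSITIVE_CONTROL_OTU):
    """Return records from non-control samples whose sequence matches control_otu."""
    return [
        r for r in records
        if r["otu"] == control_otu and not r["sample"].startswith("CONTROL_")
    ]


records = [
    {"sample": "RRHP_027_D8", "otu": "TAX:1287025", "type": "primary", "reads": 1},
    {"sample": "BAYS_002_F11", "otu": "TAX:1287025", "type": "secondary", "reads": 1},
    # Hypothetical records: a positive control well and an unrelated sequence.
    {"sample": "CONTROL_POS_BAYS_017_G12", "otu": "TAX:1287025", "type": "primary", "reads": 466},
    {"sample": "BAYS_002_F11", "otu": "TAX:0000000", "type": "primary", "reads": 350},
]
hits = control_hits(records)
by_type = Counter(r["type"] for r in hits)

# Share of samples with secondary sequences that carry the control OTU (7 of 5838).
secondary_share = 100 * 7 / 5838
print(by_type, round(secondary_share, 2))
```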

Location of the contaminants relative to the source:

Orange square: positive controls
Green squares: samples with positive control contamination

Read count mean of all secondary sequences in all samples: 4.96586979401594 
Read count mean of all positive control sequences in other samples: 1 

Read count median of all secondary sequences in all samples: 1 
Read count median of all positive control sequences in other samples: 1

Blue solid line: secondary hit read mean
Orange dotted lines: mean of reads found as secondary contaminants from the positive controls in other samples
Both lines should be in close proximity, meaning that the secondary contamination from positive controls is comparable to the potential contamination in other samples.

NOTE: Non-control samples with control reads recognised as the primary hit need to be examined further!
Sample Count OTU Sequence
RRHP_027_D8 1 TAX:1287025 primary
NOTE: the above samples are automatically removed if:
  • There’s only one primary read
  • Secondary sequence found in the same sample is not an Arthropod


Negative control contamination

    Distribution of reads in negative controls

    NOTE: contamination source can be either primary or secondary sequence within samples!
    Family No. Source Samples
    Chironomidae 73
    Tachinidae 51
    Hominidae 9
    Culicidae 7
    Anthomyiidae 5
    Dolichopodidae 3
    Platygastridae 2
    Agromyzidae 1
    Aleyrodidae 1
    Aphididae 1
    Cecidomyiidae 1
    None 1
    Scelionidae 1
    Tipulidae 1

    Outline: negative controls with contaminants
    Colour of the outline indicates partners to track the samples between partner and UMI plates.
    Thicker chartreuse outline: FAILED negative controls with contaminants [2%]
    Numbers indicate the read count
    Squares that are not outlined represent potential sources of contamination within plates: identical sequences found within these wells and negative controls.

    Assessment of primary and secondary sequences

    NOTE: Controls are not included! 
    
    Number of wells with a primary sequence only: 2327 
    Number of wells with primary and secondary sequences: 5794 
    
    Number of primary chimeric sequences: 56 
    Number of secondary chimeric sequences: 7600 
    
    NOTE: All secondary chimeric sequences successfully removed
    Number of samples with only primary chimeric sequence recognised: 28 
    We do not know how mBRAVE recognises chimeras - for now only samples represented by fewer than 5 reads are removed
    Retained samples: 6
    Number of EXCLUDED primary sequences: 499 
     which constitutes 6.16125447586122 % of all samples 
    These samples are not being removed - it's an mBRAVE cut-off 
    
    Number of primary sequences with no taxonomy assigned: 608 
     which constitutes 7.5070996419311 % of all samples 
    These samples are going to be examined further

    Number of samples with no taxonomy assigned that will be replaced with the secondary sequence based on the sequence similarity: 150 
    Other sequences with no taxonomy assigned to the primary sequence will remain unchanged.

    If the first entry is not ‘Arthropod’, then the second entry is likely correct [based on manual observations]

    Number of samples with Wolbachia detected: 335 
    
    Table with plate positions, number of reads, and sequences saved to the output directory:
     /nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch18/
    Number of samples with Nematoda, Tardigrada, Annelida, and/or Rotifera detected: 80 
    
    Table with plate positions, number of reads, and sequences saved to the output directory:
     /nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch18/
    Taxon Frequency
    Chordata 20
    Nematoda 9
    Proteobacteria 54
    Rotifera 2
    85 wells had primary non-Arthropod hits and secondary Arthropod hits 
    NOTE: Primary hits are going to be replaced
     
    Samples with only non-Arthropod sequences detected: 29 
    NOTE: These samples have been excluded!
    Number of African Anopheline samples: 50 
    Number of primary African Anopheline hits [250 or more reads]: 24 
    NOTE: All primary mosquito samples removed!

    Number of samples with only primary Arthropod sequence: 5779 
     73.1982267257758 % of all remaining samples 
    
    Number of samples where secondary sequence is not present elsewhere on the partner or UMI plate: 0 
    Number of conflicting sequences [sequences are in different families or orders, both have good read support]: 231
    Primary hit Number
    Arthropoda 7210
    None 531
    Number of retained samples: 7741 
    Number of Arthropod samples assigned by mBRAVE [this includes samples with 5 or fewer reads that have now been excluded!]: 7376 
    Number of samples with replaced sequences: 34 
    Retained chimeras: 20 
    Retained samples with no taxonomy: 531 
    
    Each retrieved sample has only one sequence: TRUE
    Number of samples  Description  Category  Decision
    4402  Only one sequence with more than 200 reads, no secondary sequence detected  1  YES
    863  Only one sequence with 50 to 200 reads, no secondary sequence detected  2  YES
    482  Only one sequence with 5 or more but less than 50 reads, no secondary sequence detected  3  YES
    123  Dominant sequence with more than 200 reads, non-conflicting secondary sequences with 5 or less reads  4  YES
    48  Dominant sequence with 50 to 200 reads, non-conflicting secondary sequences with 5 or less reads  5  YES
    671  Dominant sequence with more than 200 reads, conflicting secondary sequences with 5 or less reads  6  YES
    376  Dominant sequence with 50 to 200 reads, conflicting secondary sequences with 5 or less reads  7  YES
    279  Dominant sequence with more than 200 reads, secondary sequences with more than 5 read support  8  NO
    177  Dominant sequence with 50 to 200 reads, secondary sequences with more than 5 read support  9  NO
    251  Dominant sequence with 5 or more but less than 50 reads, non-conflicting secondary sequences with less than 5 reads  10  NO
    69  Dominant sequence with more than 5 but less than 50 reads, any other secondary reads present  11  NO
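
The category table above can be read as a small decision function. The sketch below is one possible encoding, not the pipeline's actual code. The function name and arguments are hypothetical, and the handling of boundary cases that the descriptions leave open (for example a dominant sequence with exactly 5 reads plus a conflicting secondary sequence) is an assumption.

```python
def classify(dominant, secondary=0, conflicting=False):
    """Return (category, decision) for one sample, or None if the table has no row.

    dominant:    read count of the dominant (primary) sequence
    secondary:   highest read count among secondary sequences, 0 if there are none
    conflicting: True if a secondary sequence is in a different family or order
    """
    if dominant < 5:
        return None  # not covered by the table
    if dominant < 50:
        if secondary == 0:
            return 3, "YES"
        if not conflicting and secondary < 5:
            return 10, "NO"
        return 11, "NO"
    high = dominant > 200
    if secondary == 0:
        return (1 if high else 2), "YES"
    if secondary > 5:
        return (8 if high else 9), "NO"
    if conflicting:
        return (6 if high else 7), "YES"
    return (4 if high else 5), "YES"


print(classify(300))           # single well-supported sequence
print(classify(100, 3, True))  # weak conflicting secondary sequence
print(classify(250, 12))       # well-supported secondary sequence
```

Under this encoding, a well-supported secondary sequence (more than 5 reads) always leads to a NO decision, whatever the read count of the dominant sequence.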
    Decision category Number of samples
    NO 776
    YES 6965
    8.91869631721379 % OF SAMPLES EXCLUDED [all samples]
    18.0491822567361 % OF SAMPLES EXCLUDED [only approved samples]
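
The two exclusion percentages are consistent with the following arithmetic: both are taken relative to the 8499 samples in the batch, the first using the 7741 retained samples and the second using the 6965 samples with a YES decision. This reading is inferred from the reported numbers and is not stated in the report; the variable names are hypothetical.

```python
total_samples = 8499   # samples in the batch, excluding controls
retained = 7741        # retained samples before the category decision
approved = 6965        # samples with decision YES

pct_excluded_all = 100 * (total_samples - retained) / total_samples
pct_excluded_approved = 100 * (total_samples - approved) / total_samples

# The YES and NO counts add up to the retained samples.
assert 6965 + 776 == retained

print(round(pct_excluded_all, 4), round(pct_excluded_approved, 4))
```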

    Plate heatmaps - retained samples


    NOTE: The heatmaps below show only the retained samples. Controls, chimeric samples, non-Arthropod samples, and samples with no taxonomy assigned have been removed or replaced!

    Final fasta file successfully saved: /nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch18/BOLD_filtered_sequences_batch18.fasta 
    Final metadata file successfully saved: /nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch18/BOLDfiltered_metadata_batch18.csv

    The report and output files have been successfully generated!


    Number of retained samples per partner plate
    Plate Original number of samples Number of samples after filtering Percentage
    WTPV_018 93 93 100.000000
    RRHP_063 14 14 100.000000
    WTPV_025 49 48 97.959184
    BAYS_018 93 91 97.849462
    FACE_011 93 91 97.849462
    RRHP_045 93 91 97.849462
    WTPV_016 93 91 97.849462
    WTPV_019 93 91 97.849462
    RRHP_061 93 90 96.774193
    FACE_009 93 89 95.698925
    FACE_010 93 89 95.698925
    LFLA_010 93 89 95.698925
    WTPV_020 93 89 95.698925
    BAYS_022 93 88 94.623656
    WTPV_013 93 88 94.623656
    WTPV_015 93 88 94.623656
    LFLA_014 18 17 94.444444
    BAYS_003 93 87 93.548387
    WTPV_014 93 87 93.548387
    WTPV_022 93 87 93.548387
    LFLA_002 31 29 93.548387
    CAMP_025 75 70 93.333333
    BAYS_014 93 86 92.473118
    CAMP_036 93 86 92.473118
    RRHP_029 93 86 92.473118
    RRHP_048 53 49 92.452830
    FACE_006 93 85 91.397850
    WTPV_021 93 85 91.397850
    WTPV_024 93 85 91.397850
    CAMP_029 93 84 90.322581
    CAMP_034 93 84 90.322581
    CAMP_035 93 84 90.322581
    FACE_012 93 84 90.322581
    FACE_014 93 84 90.322581
    LFLA_001 93 84 90.322581
    RRHP_046 93 84 90.322581
    RRHP_062 93 84 90.322581
    BAYS_002 93 83 89.247312
    BAYS_015 93 83 89.247312
    BAYS_019 93 83 89.247312
    CAMP_031 93 83 89.247312
    FACE_007 93 83 89.247312
    FACE_008 93 83 89.247312
    FACE_013 93 83 89.247312
    RRHP_047 93 83 89.247312
    WTPV_017 93 83 89.247312
    BAYS_020 93 82 88.172043
    FACE_213 93 82 88.172043
    BAYS_001 93 81 87.096774
    CAMP_030 93 81 87.096774
    BAYS_011 93 80 86.021505
    CAMP_033 93 80 86.021505
    LFLA_003 93 80 86.021505
    CAMP_024 74 63 85.135135
    BAYS_008 93 79 84.946237
    WTPV_023 93 79 84.946237
    BAYS_017 93 78 83.870968
    LFLA_007 93 78 83.870968
    BAYS_010 93 77 82.795699
    CAMP_032 93 77 82.795699
    CAMP_037 93 77 82.795699
    FACE_018 93 77 82.795699
    LFLA_008 93 77 82.795699
    LFLA_004 93 76 81.720430
    BAYS_005 93 75 80.645161
    FACE_017 93 75 80.645161
    BAYS_004 93 74 79.569892
    BAYS_016 93 74 79.569892
    FACE_016 93 74 79.569892
    FACE_210 93 74 79.569892
    LFLA_050 93 74 79.569892
    FACE_015 93 73 78.494624
    BAYS_013 93 72 77.419355
    LFLA_005 93 72 77.419355
    LFLA_006 93 72 77.419355
    BAYS_009 93 71 76.344086
    BAYS_012 93 70 75.268817
    FACE_212 93 69 74.193548
    BAYS_006 93 68 73.118280
    LFLA_009 93 66 70.967742
    FACE_208 93 64 68.817204
    FACE_020 93 63 67.741935
    BAYS_007 93 62 66.666667
    FACE_217 93 62 66.666667
    FACE_211 93 61 65.591398
    FACE_204 93 58 62.365591
    FACE_202 93 56 60.215054
    FACE_205 93 55 59.139785
    FACE_203 93 54 58.064516
    FACE_207 93 48 51.612903
    FACE_201 93 47 50.537634
    FACE_218 12 6 50.000000
    LFLA_012 93 45 48.387097
    FACE_206 93 44 47.311828
    RRHP_026 93 3 3.225807
    RRHP_027 82 2 2.439024

    Number of retained samples per partner
    Partner Original number of samples Number of samples after filtering Percentage
    WTPV 1165 1094 93.90558
    CAMP 986 869 88.13387
    BAYS 1953 1644 84.17819
    LFLA 1072 859 80.13060
    FACE 2523 1913 75.82243
    RRHP 800 586 73.25000

    Number of retained samples per UMI plate
    Plate Original number of samples Number of samples after filtering Percentage
    17 372 356 95.69892
    16 372 354 95.16129
    2 372 353 94.89247
    6 332 306 92.16867
    14 372 337 90.59140
    18 372 336 90.32258
    24 372 331 88.97849
    1 353 314 88.95184
    13 372 326 87.63441
    23 372 321 86.29032
    3 372 314 84.40860
    9 354 298 84.18079
    19 328 276 84.14634
    12 372 308 82.79570
    10 372 304 81.72043
    22 291 234 80.41237
    20 372 293 78.76344
    21 372 293 78.76344
    11 372 289 77.68817
    5 361 263 72.85319
    15 235 165 70.21277
    4 372 218 58.60215
    7 293 171 58.36177
    8 372 205 55.10753

    Plates with a low number of reads and few retained samples should be examined!
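
A minimal sketch of how the retention tables above could be computed and screened. The four rows are copied from the partner plate table; the function name is hypothetical, and the 50 % cut-off is an arbitrary illustration, not a threshold used by the report.

```python
def retention(rows):
    """rows: (plate, original, retained) tuples -> (plate, percent) sorted high to low."""
    table = [(plate, 100 * kept / original) for plate, original, kept in rows]
    return sorted(table, key=lambda row: row[1], reverse=True)


rows = [
    ("RRHP_026", 93, 3),
    ("WTPV_018", 93, 93),
    ("FACE_201", 93, 47),
    ("LFLA_012", 93, 45),
]
table = retention(rows)

# Hypothetical screening threshold for plates that need a closer look.
low = [plate for plate, pct in table if pct < 50]
print(table, low)
```

With these rows, LFLA_012 and RRHP_026 fall below the illustrative threshold, while FACE_201 (about 50.5 %) just passes it.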

    Samples to examine manually

    Failed negative controls [2%] with contamination other than Bovidae:
    
    CONTROL_NEG_LYSATE_BAYS_005_H12
    CONTROL_NEG_LYSATE_BAYS_007_H12
    CONTROL_NEG_LYSATE_BAYS_008_H12
    CONTROL_NEG_LYSATE_BAYS_010_H12
    CONTROL_NEG_LYSATE_CAMP_025_B1
    CONTROL_NEG_LYSATE_CAMP_025_B2
    CONTROL_NEG_LYSATE_FACE_203_H12
    CONTROL_NEG_LYSATE_FACE_218_A8
    CONTROL_NEG_LYSATE_FACE_218_B7
    CONTROL_NEG_LYSATE_FACE_218_E8
    CONTROL_NEG_LYSATE_FACE_218_F8
    CONTROL_NEG_LYSATE_RRHP_027_F11
    CONTROL_NEG_LYSATE_WTPV_017_H12
    
    These samples may have insects in them!

    Plate heatmaps - all [partner and UMI plates]